The following information is intended as a guide to performance tuning
and configuration for the C.03.00 OTS stack.
This section contains the following parts:
A. Factors That Influence OTS Performance
Terminology and Concepts
Performance Factors
Use of Performance Numbers
B. OTS Performance Test Model
C. OTS Performance
Throughput Rate
Request-Reply Performance
Performance Tuning
A. Factors That Influence OTS Performance
There are many factors that influence overall performance of a network
application. The purpose of this section is to define some common terms
and to describe (loosely) their interrelationship with respect to
performance.
A-1) Terminology and Concepts
o A TSDU (Transport Service Data Unit) is the logical unit of information transmitted between peer transport service, or XTI, users. A TPDU (Transport Protocol Data Unit) is the physical unit of data transmitted between peer transport entities across the network. One or more TPDUs are used to transmit one TSDU. The "window" size determines the number of TPDUs that the transport entity can transmit before waiting for an acknowledgment from the peer entity.
A TIDU (Transport Intermediate Data Unit) is the physical unit of data passed internally between the application and the transport entity (OTS) on the same system. Typically, one or more TIDUs are used to pass one TSDU. For the XTI send (t_snd) and receive (t_rcv) functions, the T_MORE flag is used to indicate whether or not the TSDU segment is the last part of the complete TSDU.
o An SSDU (Session Service Data Unit) is the logical unit of information transmitted between peer session service users. The session entities transmit information in the form of SPDUs (Session Protocol Data Units) using the transport service. A session entity may transmit one or more SPDUs in a single TSDU.
The SSDU size for session version 2 normal data transfer is unlimited.
OTS segments an SSDU of normal data into SPDUs based on the underlying
TPDU size.
o A PSDU (Presentation Service Data Unit) is the logical unit of information transmitted between peer presentation service, or APLI, users. The presentation entities transmit information in the form of PPDUs (Presentation Protocol Data Units) using the session service. A presentation entity may transmit one or more PPDUs in a single SSDU.
NOTE: We have simplified the conceptual model by stopping at the Transport layer. Actually each TPDU is segmented into one or more Network layer packets; and for X.25, each Network packet is segmented into one or more Data Link layer frames. The Data Link frame is the physical unit of data transmitted between systems.
In OTS, applications run in user space, and the protocol entities are in the kernel, under Streams. When an application issues an open(2) system call to open a Streams device, a pair of Streams queues is created in the kernel (one for sending, and one for receiving). The XTI, Session, and APLI libraries use the Streams system calls putmsg(2) and getmsg(2) to communicate using TIDUs, SIDUs, and PIDUs, respectively, with the OTS protocol stack.
The maximum TPDU and window sizes are parameters of the OTS stack. The default maximum TPDU and window sizes are user-configurable, but these can be negotiated downward during connection establishment with a peer Transport entity.
The TSDU size is controlled by the XTI application. If the TSDU size exceeds the maximum TIDU size, the XTI library will segment the TSDU into multiple TIDUs.
In this section, we will assume that one TIDU is used for each TSDU for an XTI application. That is, the performance tests ensure that the TSDU is not so large that it must be segmented into multiple TIDUs. Consequently, in discussing the interactions between an application and the OTS stack, we will speak only of TSDUs, even though TIDU might be a more appropriate term.
When discussing the performance of the XTI, Session, or APLI application below, we will often use the term SDU to refer to a TSDU, SSDU, or PSDU, respectively. SDU (Service Data Unit) is meant to be a generic term for the unit of data seen by an application.
A-2) Performance Factors
Larger TSDUs require fewer transfers between the application and the OTS stack. This results in fewer context switches and lower system overhead. For example, if an application is sending 100 Kbytes in a file transfer, this requires 100 transfers if the TSDU size is 1024; but it requires only 25 transfers if the TSDU size is 4096.
Similarly, larger TPDUs require fewer transfers between peer transport entities across the network. Usually this results in better throughput, due to decreased bandwidth used for protocol headers and acknowledgements. For example, if the OTS stack transmits a 2000 byte TSDU, this requires 17 transfers if the TPDU size is set to 128; but it requires only 1 transfer if the TPDU size is 2048.
Larger window sizes allow the sending system to transmit more TPDUs before waiting for the receiving system to acknowledge their arrival. This increases the concurrent operation of the two systems.
Thus, we might expect that large TSDUs, large TPDUs, and large windows would maximize throughput. However, extremes in these directions can have a negative impact on performance because of other system and network factors.
For example, if we use a large window size and the receiving system is much slower than the sender, the sender is likely to overrun the LAN interface on the receiving system, and TPDUs will be dropped (lost). This will result in Transport retransmissions. Also, a lot of kernel memory may be consumed in the systems. When multiplied over a large number of connections, this can result in overall system performance degradation.
Retransmissions can have a significant impact on effective performance for two reasons. First, because the timer resolution is one second, even one retransmission on a 10 Mbit/sec LAN can devastate throughput, even for transfers of a megabyte of data. Second, the sender must retransmit all of the unacknowledged TPDUs; typically, this is an entire window. For larger window and TPDU sizes this means retransmitting more data.
Large TPDU sizes, whether you are using XTI, the Session Layer Access, or APLI, also cause the stack to take up more memory for PDUs in internal protocol and Streams queues. This can degrade the performance of the entire system if there are a large number of OTS connections sending and/or receiving data.
Consequently, a delicate balance must be found among these parameters in order to maximize performance. Unfortunately, finding this balance is not an exact science. Experimentation with each situation is required.
B. OTS Performance Test Model
The following important points should be noted about the performance
measurements below:
- Measurements on a series 842 were made while it was communicating with an 827. Therefore, whether the 842 was the sender or the receiver, it was bottlenecked by its own speed (and by the link, of course). Similarly, measurements of the 720 and 827 were made against the series 750, except where noted.
- Except where noted, all the stack parameters were set to the default values.
- Only unidirectional data was considered while measuring throughput.
- All X.25 measurements were made using an Amnet switch with the line speed set to 64 Kbps.
- All the LAN measurements were made on an isolated LAN segment.
- Superfluous processes running in the system were killed during the measurements (notably, the NFS daemons, syncer, cron, sendmail, etc.).
- In all the measurements over the LAN, the throughput rate was best when the SDU size used was a little under 4096. The lone exception was the XTI receiver, which showed a throughput of 704 Kbytes/sec, utilizing 49.5% of an 842, at an SDU size of 3840.
- In all the measurements over X.25, the throughput rate was best when the SDU size used was 2304 bytes (for the 842), or when the SDU size was a little under 4096 (for the 827 and 750).
- All measurements were made only after the connection was established
(to factor out connection establishment costs).
- All request-reply measurements were made with the requester sending 'n' bytes as a request and the reply being a fixed 100 bytes in size. Data is given for different values of the request size.
Performance of a customer's network will depend upon the stack
parameters, the number of systems in the LAN (LAN traffic), the
number of subnets configured, the amount of physical memory
available, and the SDU sizes used.
For example, if a large number of connections are in operation,
all of them pumping a lot of data into the network, some PDUs
may be dropped due to buffer shortages; this can result in
retransmissions, thereby affecting performance. Similarly, if a
lot of other tasks (unrelated to OTS) are run on the system,
the performance of OTS can degrade considerably because of
contention for CPU and memory between OTS and the other tasks.
Using SDU sizes larger than 4096 bytes may result in slightly
higher performance, but CPU utilization may increase
considerably. Moreover, using SDU sizes larger than 4096 in
combination with a large number of connections can cause
(depending on how much data is handled by each connection)
performance degradation of the entire system, as it ties up a
lot of kernel memory.
Measured performance can also be affected by the total amount of
data transferred during the lifetime of a connection. This is
because there is a fixed overhead in connection establishment,
congestion avoidance algorithms used by Transport, delayed
acknowledgement, etc.
C. OTS Performance
C-1) Throughput Rate:
Figure 4 gives the peak observed throughput using XTI, Session
and APLI programmatic access methods over 802.3. Figure 5
gives the throughput over X.25.
All measurements were made on an 842 communicating with an 827.
CPU utilization figures are those of the 842.
Figure 5: Peak Throughput measurements over X.25 (1 connection)
C-2) Request-Reply Performance
Figures 6 and 7 show the request-reply performance of an 842 communicating with an 827 over 802.3 and X.25. Transactions per second and CPU utilization are shown for two SDU sizes, a "small" 100 byte SDU, and a "large" 3840 byte SDU.
Figure 7: Request-Reply measurements over X.25 (1 connection)
C-3) Performance Tuning
The following data provides guidance in configuring OTS for optimal performance. OTS throughput and request-reply performance are compared for OTS running on different processors and network interfaces. The effect of the TP4 checksum calculation on performance is measured, and throughput measurements are taken for varying TPDU and window sizes.
Figure 8 shows the relative performance of the 842, 827, 720, and 750 over 802.3 using XTI. Measurements were taken with an 842 sending data to an 827, the 827 sending to a 750, and the 720 and 750 sending to each other.
In general, the relative performance of the other APIs to XTI should be similar to their relative performance as measured on the 842. The performance of machines other than those shown here can best be estimated by using data for a machine with a similar processor and I/O architecture. For instance, comparisons can be made between the 827 and other members of the 8x7 family of machines, or between the 720 and 750, by scaling the CPU utilization by processor speed.
Figure 9: Send throughput for systems over XTI/X.25 (1 connection)
Figures 10 and 11 show a comparison of different systems for request-reply traffic. As before, the machines were paired as 842-827, 827-750, 720-750, and 750-720. Request-reply traffic is more dependent than throughput on the speed of the partner machine; therefore, the raw number of transactions per second is not as important as the amount of CPU power needed to process each transaction.
Figure 11: Request-Reply measurements for systems over XTI/X.25
(1 connection)
The effect of using the optional TP4 checksum is measured in Figure 12, tested between two 827 machines running an XTI application. The checksum calculation consumes an additional 31-53% CPU (31% on the receiver, 53% on the sender) while delivering 11% less throughput.
------------------------------------------------------------------
                 | Throughput   | CPU Util | CPU Util | SDU size |
                 | (Kbytes/sec) | Sender   | Receiver | (bytes)  |
------------------------------------------------------------------
with checksum    |    821.4     |  64.0 %  |  79.6 %  |   3840   |
without checksum |    927.0     |  41.8 %  |  60.6 %  |   3840   |
------------------------------------------------------------------
Figure 12: Effect of TP4 checksum on XTI throughput over 802.3
Varying the TPDU and/or transport window size is another way of trying to tune performance. Setting a larger TPDU size will allow OTS to operate more efficiently, handling larger chunks of data at a time. However, this may be offset by the increased penalty for retransmissions, depending on the reliability and congestion on the LAN. Also, a larger TPDU or window size increases the number of kernel buffers which are needed to store data which is pending acknowledgement or is waiting for delivery to the application.
The data shows the results of varying the TPDU size using the default window size, then varying the window size with the default TPDU size. All measurements were taken over XTI between two 827s. No measurements are shown for an 8192-byte TPDU because, in the Streams environment, XTI TSDUs larger than 4 KB are fragmented into smaller TIDUs; performance for a TPDU size of 8192 bytes will be similar to that shown for 4096.
Figure 13: Effect of varying TPDU and window size on XTI throughput over 802.3
Notice that performance increases significantly with increasing TPDU sizes. The best throughput seems to occur with a window size of 12, although the differences are not so dramatic. Again, these measurements were taken under ideal conditions over an isolated LAN.
Figure 14 shows the same measurements taken over X.25 with the default CONS TPDU and window sizes. The best throughput comes with a TPDU size of 256 bytes, although this comes at the cost of some additional CPU cycles. Also, note that the performance only improves slightly with increasing window size. This confirms our expectation that X.25 performance is limited by the speed of the X.25 interface and the X.25 subnetwork, and not by processing or protocol delays in the OSI stack.
Below is a list of all the available transport and session PICS for OPUSk. There is currently no automated way to retrieve these, since we are limited to paper copies. Please send requests for OTS PICS to Jean-Yves RIGAULT at ENMC Grenoble.